Descriptive statistics

Data Import

We have collected data from a quasi-experimental study of medical students conducted in 2019. At that time, meeting platforms such as Zoom or Teams were still unfamiliar to most people. Let us import the data into R:

require(data.table)
Loading required package: data.table
require(lsr)
Loading required package: lsr
classStudyRaw=fread(file = "virtual.csv")
who(expand = TRUE)
   -- Name --         -- Class --   -- Size --
   classStudyRaw      data.table    49 x 18   
    $idertifier       character     49        
    $type.class       character     49        
    $pretest.score    integer       49        
    $pretest.rank     integer       49        
    $posttest.score   integer       49        
    $posttest.rank    integer       49        
    $sex              integer       49        
    $likert.1         integer       49        
    $likert.2         integer       49        
    $likert.3         integer       49        
    $likert.4         integer       49        
    $likert.5         integer       49        
    $likert.6         integer       49        
    $likert.7         integer       49        
    $likert.8         integer       49        
    $likert.9         integer       49        
    $likert.10        integer       49        
    $likert.total     numeric       49        

We can see that classStudyRaw is a data.table with dimensions 49 x 18: 49 rows, corresponding to 49 students, and 18 variable columns. It includes an identifier for each student. The type of class is stored as “character”, but we want it as a “factor” because it will be the grouping variable. The columns likert.1 to likert.10 are presumably the scores for the individual questions, and the researcher appears to have treated them as a pseudo-interval scale to obtain a total Likert score, likert.total. So we can ignore the individual questions for the time being. Similarly, we can drop the pretest and posttest ranks, because what we are interested in is the scores.

So the final data table can be subset to 6 columns, and type.class should be converted to a factor.

classStudy=classStudyRaw[,c(1,2,3,5,7,18)]
classStudy$type.class=as.factor(classStudy$type.class)
who(expand = TRUE)
   -- Name --         -- Class --   -- Size --
   classStudy         data.table    49 x 6    
    $idertifier       character     49        
    $type.class       factor        49        
    $pretest.score    integer       49        
    $posttest.score   integer       49        
    $sex              integer       49        
    $likert.total     numeric       49        
   classStudyRaw      data.table    49 x 18   
    $idertifier       character     49        
    $type.class       character     49        
    $pretest.score    integer       49        
    $pretest.rank     integer       49        
    $posttest.score   integer       49        
    $posttest.rank    integer       49        
    $sex              integer       49        
    $likert.1         integer       49        
    $likert.2         integer       49        
    $likert.3         integer       49        
    $likert.4         integer       49        
    $likert.5         integer       49        
    $likert.6         integer       49        
    $likert.7         integer       49        
    $likert.8         integer       49        
    $likert.9         integer       49        
    $likert.10        integer       49        
    $likert.total     numeric       49        
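
Selecting columns by position is fragile if the column order in the CSV ever changes. A minimal sketch of the same step using column names, shown on a small made-up data frame: the values in `toyRaw` are invented for illustration, and only the column names come from the study data.

```r
# Toy stand-in for classStudyRaw; the values are invented for illustration.
toyRaw <- data.frame(
  idertifier     = c("s1", "s2", "s3", "s4"),
  type.class     = c("physical", "virtual", "virtual", "physical"),
  pretest.score  = c(20L, 40L, 60L, 0L),
  pretest.rank   = c(3L, 2L, 1L, 4L),
  posttest.score = c(75L, 80L, 90L, 60L),
  stringsAsFactors = FALSE
)

# Keep columns by name rather than by position.
toySub <- toyRaw[, c("idertifier", "type.class", "pretest.score", "posttest.score")]

# Convert the grouping variable to a factor.
toySub$type.class <- factor(toySub$type.class)

# Inspect the result with base R.
str(toySub)
```

A literal character vector of names inside `[ , ]` should also work on a data.table in recent versions of the package, so the same pattern can be applied to classStudyRaw.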

Summarise data

There are several ways to summarise data in R. Base R provides the summary() function:

summary(classStudy)
  idertifier           type.class pretest.score   posttest.score  
 Length:49          physical:24   Min.   : 0.00   Min.   : 50.00  
 Class :character   virtual :25   1st Qu.:20.00   1st Qu.: 75.00  
 Mode  :character                 Median :40.00   Median : 80.00  
                                  Mean   :41.63   Mean   : 78.06  
                                  3rd Qu.:60.00   3rd Qu.: 85.00  
                                  Max.   :80.00   Max.   :100.00  
      sex         likert.total  
 Min.   :1.000   Min.   :3.100  
 1st Qu.:1.000   1st Qu.:3.500  
 Median :2.000   Median :3.700  
 Mean   :1.633   Mean   :3.665  
 3rd Qu.:2.000   3rd Qu.:3.800  
 Max.   :2.000   Max.   :4.100  

We can see that sex is another grouping variable, coded as 1 and 2 without labels. We can assign the labels male and female to 1 and 2 using the levels() function, after first converting sex to a factor.

classStudy$sex=as.factor(x = classStudy$sex)
levels(classStudy$sex)=c("male","female")
classStudy$sex
 [1] male   male   female female male   male   female female female female
[11] male   male   male   female female female female female male   female
[21] female male   female female female male   female female female female
[31] male   female female male   female male   female male   female female
[41] female male   female female male   female female male   male  
Levels: male female
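
Assigning labels through levels() relies on the codes sorting in the expected order. An alternative sketch, assuming as in the text that 1 means male and 2 means female, maps each code to its label explicitly in a single factor() call. The codes in `sexCode` are invented for illustration.

```r
# Invented codes for illustration; 1 = male, 2 = female as assumed in the text.
sexCode <- c(1L, 1L, 2L, 2L, 1L)

# Map each code to its label explicitly, so the result does not depend on
# which codes happen to be present or how they sort.
sexFactor <- factor(sexCode, levels = 1:2, labels = c("male", "female"))

# Count the number of observations per label.
table(sexFactor)
```

With this form, a code that is neither 1 nor 2 becomes NA instead of silently receiving a label.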

Let's summarise again:

summary(classStudy)
  idertifier           type.class pretest.score   posttest.score       sex    
 Length:49          physical:24   Min.   : 0.00   Min.   : 50.00   male  :18  
 Class :character   virtual :25   1st Qu.:20.00   1st Qu.: 75.00   female:31  
 Mode  :character                 Median :40.00   Median : 80.00              
                                  Mean   :41.63   Mean   : 78.06              
                                  3rd Qu.:60.00   3rd Qu.: 85.00              
                                  Max.   :80.00   Max.   :100.00              
  likert.total  
 Min.   :3.100  
 1st Qu.:3.500  
 Median :3.700  
 Mean   :3.665  
 3rd Qu.:3.800  
 Max.   :4.100  

Summarise based on group

by(classStudy,INDICES = ~classStudy$type.class+classStudy$sex,FUN = summary)
classStudy$type.class: physical
classStudy$sex: male
  idertifier           type.class pretest.score   posttest.score      sex   
 Length:9           physical:9    Min.   : 0.00   Min.   :50.00   male  :9  
 Class :character   virtual :0    1st Qu.:20.00   1st Qu.:75.00   female:0  
 Mode  :character                 Median :40.00   Median :85.00             
                                  Mean   :33.33   Mean   :78.33             
                                  3rd Qu.:40.00   3rd Qu.:85.00             
                                  Max.   :60.00   Max.   :90.00             
  likert.total  
 Min.   :3.700  
 1st Qu.:3.700  
 Median :3.800  
 Mean   :3.856  
 3rd Qu.:4.000  
 Max.   :4.100  
------------------------------------------------------------ 
classStudy$type.class: virtual
classStudy$sex: male
  idertifier           type.class pretest.score   posttest.score       sex   
 Length:9           physical:0    Min.   : 0.00   Min.   : 55.00   male  :9  
 Class :character   virtual :9    1st Qu.:40.00   1st Qu.: 75.00   female:0  
 Mode  :character                 Median :40.00   Median : 80.00             
                                  Mean   :44.44   Mean   : 81.11             
                                  3rd Qu.:60.00   3rd Qu.: 90.00             
                                  Max.   :60.00   Max.   :100.00             
  likert.total  
 Min.   :3.200  
 1st Qu.:3.300  
 Median :3.500  
 Mean   :3.444  
 3rd Qu.:3.500  
 Max.   :3.800  
------------------------------------------------------------ 
classStudy$type.class: physical
classStudy$sex: female
  idertifier           type.class pretest.score   posttest.score      sex    
 Length:15          physical:15   Min.   : 0.00   Min.   :60.00   male  : 0  
 Class :character   virtual : 0   1st Qu.:20.00   1st Qu.:75.00   female:15  
 Mode  :character                 Median :40.00   Median :80.00              
                                  Mean   :37.33   Mean   :77.67              
                                  3rd Qu.:60.00   3rd Qu.:82.50              
                                  Max.   :80.00   Max.   :90.00              
  likert.total 
 Min.   :3.60  
 1st Qu.:3.70  
 Median :3.80  
 Mean   :3.82  
 3rd Qu.:3.90  
 Max.   :4.10  
------------------------------------------------------------ 
classStudy$type.class: virtual
classStudy$sex: female
  idertifier           type.class pretest.score   posttest.score      sex    
 Length:16          physical: 0   Min.   :20.00   Min.   :55.00   male  : 0  
 Class :character   virtual :16   1st Qu.:35.00   1st Qu.:68.75   female:16  
 Mode  :character                 Median :50.00   Median :80.00              
                                  Mean   :48.75   Mean   :76.56              
                                  3rd Qu.:60.00   3rd Qu.:85.00              
                                  Max.   :80.00   Max.   :90.00              
  likert.total  
 Min.   :3.100  
 1st Qu.:3.400  
 Median :3.500  
 Mean   :3.538  
 3rd Qu.:3.700  
 Max.   :4.000  

Describe function

require(psych)
Loading required package: psych
describeBy(x = classStudy, group = classStudy$type.class)

 Descriptive statistics by group 
group: physical
               vars  n  mean    sd median trimmed   mad  min  max range  skew
idertifier        1 24 12.50  7.07   12.5   12.50  8.90  1.0 24.0  23.0  0.00
type.class        2 24  1.00  0.00    1.0    1.00  0.00  1.0  1.0   0.0   NaN
pretest.score     3 24 35.83 23.58   40.0   35.00 29.65  0.0 80.0  80.0  0.23
posttest.score    4 24 77.92 10.31   80.0   79.00  7.41 50.0 90.0  40.0 -1.08
sex               5 24  1.62  0.49    2.0    1.65  0.00  1.0  2.0   1.0 -0.48
likert.total      6 24  3.83  0.16    3.8    3.83  0.15  3.6  4.1   0.5  0.23
               kurtosis   se
idertifier        -1.35 1.44
type.class          NaN 0.00
pretest.score     -1.01 4.81
posttest.score     0.41 2.10
sex               -1.84 0.10
likert.total      -1.12 0.03
------------------------------------------------------------ 
group: virtual
               vars  n  mean    sd median trimmed   mad  min max range  skew
idertifier        1 25 37.00  7.36   37.0   37.00  8.90 25.0  49  24.0  0.00
type.class        2 25  2.00  0.00    2.0    2.00  0.00  2.0   2   0.0   NaN
pretest.score     3 25 47.20 20.72   40.0   47.62 29.65  0.0  80  80.0 -0.29
posttest.score    4 25 78.20 12.74   80.0   78.57 14.83 55.0 100  45.0 -0.44
sex               5 25  1.64  0.49    2.0    1.67  0.00  1.0   2   1.0 -0.55
likert.total      6 25  3.50  0.23    3.5    3.50  0.30  3.1   4   0.9  0.16
               kurtosis   se
idertifier        -1.34 1.47
type.class          NaN 0.00
pretest.score     -0.64 4.14
posttest.score    -0.83 2.55
sex               -1.76 0.10
likert.total      -0.78 0.05
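
The se column in this output is the standard error of the mean, sd / sqrt(n). A quick check against the rounded values reported above for pretest.score in the physical group (sd 23.58, n 24), followed by a small helper for raw data. `seMean` is a hypothetical name chosen here, not a function from psych, and the `scores` vector is invented.

```r
# Standard error from the reported (rounded) sd and n for
# pretest.score in the physical group.
seReported <- round(23.58 / sqrt(24), 2)
seReported

# Hypothetical helper: standard error of the mean, ignoring missing values.
seMean <- function(x) {
  sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

scores <- c(0, 20, 40, 40, 60, 80)  # invented values
seMean(scores)
```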

Aggregate

meanPre=aggregate(x = classStudy,by = classStudy$pretest.score~classStudy$type.class+classStudy$sex,FUN = mean)
meanPre
  classStudy$type.class classStudy$sex classStudy$pretest.score
1              physical           male                 33.33333
2               virtual           male                 44.44444
3              physical         female                 37.33333
4               virtual         female                 48.75000
meanPost=aggregate(x = classStudy,by = classStudy$posttest.score~classStudy$type.class+classStudy$sex,FUN = mean)
meanPost
  classStudy$type.class classStudy$sex classStudy$posttest.score
1              physical           male                  78.33333
2               virtual           male                  81.11111
3              physical         female                  77.66667
4               virtual         female                  76.56250
meanLik=aggregate(x = classStudy,by = classStudy$likert.total~classStudy$type.class+classStudy$sex,FUN = mean)
meanLik
  classStudy$type.class classStudy$sex classStudy$likert.total
1              physical           male                3.855556
2               virtual           male                3.444444
3              physical         female                3.820000
4               virtual         female                3.537500
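
The three calls above can be written more compactly with the formula method of aggregate(), which takes a `data` argument so the columns need no `classStudy$` prefix, and with cbind() on the left-hand side to average several outcomes at once. A minimal sketch on made-up data: the values in `toyScores` are invented, and only the column names follow the study data.

```r
# Toy data; values are invented for illustration.
toyScores <- data.frame(
  type.class     = factor(c("physical", "physical", "virtual", "virtual")),
  sex            = factor(c("male", "female", "male", "female")),
  pretest.score  = c(20, 40, 60, 40),
  posttest.score = c(75, 80, 85, 90)
)

# One outcome, grouped by class type and sex.
aggregate(pretest.score ~ type.class + sex, data = toyScores, FUN = mean)

# Several outcomes at once, grouped by class type only.
meanBoth <- aggregate(cbind(pretest.score, posttest.score) ~ type.class,
                      data = toyScores, FUN = mean)
meanBoth
```

The result columns are then named type.class, pretest.score and posttest.score, which makes later steps easier to read.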

Graphing data

Post test score

hist(classStudy$posttest.score,breaks = 10,main = "Histogram of Posttest Score of all students",xlab = "Post test score",ylab = "Number",col = "light blue")

Pre test score

hist(classStudy$pretest.score,breaks = 5,main = "Histogram of Pretest Score of all students",xlab = "Pre test score",ylab = "Number",col = "light yellow")
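
When `breaks` is a single number, hist() treats it only as a suggestion and chooses rounded cut points itself, so the number of bars may differ from the number requested. A sketch with explicit bin edges gives full control over the bins. The scores in `postScores` are invented for illustration.

```r
# Invented post test scores for illustration.
postScores <- c(50, 55, 60, 75, 75, 80, 80, 80, 85, 85, 90, 100)

# Explicit bin edges every 10 points from 50 to 100.
edges <- seq(50, 100, by = 10)

# Draw the histogram and keep the returned object for inspection.
h <- hist(postScores, breaks = edges,
          main = "Histogram of Posttest Score (toy data)",
          xlab = "Post test score", ylab = "Number", col = "lightblue")

# Number of scores that fall in each bin.
h$counts
```

The edges must span the whole range of the data, otherwise hist() stops with an error.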
